Goto

Collaborating Authors

 Embedded Systems


RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices

arXiv.org Artificial Intelligence

Modern edge data centers simultaneously handle multiple Deep Neural Networks (DNNs), leading to significant challenges in workload management. Thus, current management systems must leverage the architectural heterogeneity of new embedded systems to efficiently handle multi-DNN workloads. This paper introduces RankMap, a priority-aware manager specifically designed for multi-DNN tasks on heterogeneous embedded devices. RankMap addresses the extensive solution space of multi-DNN mapping through stochastic space exploration combined with a performance estimator. Experimental results show that RankMap achieves x3.6 higher average throughput compared to existing methods, while preventing DNN starvation under heavy workloads and improving the prioritization of specified DNNs by x57.5.


Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices

Neural Information Processing Systems

Real-time automatic speech recognition (ASR) on mobile and embedded devices has been of great interests for many years. We present real-time speech recognition on smartphones or embedded systems by employing recurrent neural network (RNN) based acoustic models, RNN based language models, and beam-search decoding. The acoustic model is end-to-end trained with connectionist temporal classification (CTC) loss. The RNN implementation on embedded devices can suffer from excessive DRAM accesses because the parameter size of a neural network usually exceeds that of the cache memory and the parameters are used only once for each time step. To remedy this problem, we employ a multi-time step parallelization approach that computes multiple output samples at a time with the parameters fetched from the DRAM.


Vision Controlled Sensorized Prosthetic Hand

arXiv.org Artificial Intelligence

This paper presents a sensorized vision-enabled prosthetic hand aimed at replicating a natural hand's performance, functionality, appearance, and comfort. The design goal was to create an accessible substitution with a user-friendly interface requiring little to no training. Our mechanical hand uses a camera and embedded processors to perform most of these tasks. The interfaced pressure sensor is used to get pressure feedback and ensure a safe grasp of the object; an accelerometer is used to detect gestures and release the object. Unlike current EMG-based designs, the prototyped hand does not require personalized training. The details of the design, trade-offs, results, and informing the next iteration are presented in this paper.


Low-cost modular devices for on-road vehicle detection and characterisation

arXiv.org Artificial Intelligence

Detecting and characterising vehicles is one of the purposes of embedded systems used in intelligent environments. An analysis of a vehicle characteristics can reveal inappropriate or dangerous behaviour. This detection makes it possible to sanction or notify emergency services to take early and practical actions. Vehicle detection and characterisation systems employ complex sensors such as video cameras, especially in urban environments. These sensors provide high precision and performance, although the price and computational requirements are proportional to their accuracy. These sensors offer high accuracy, but the price and computational requirements are directly proportional to their performance. This article introduces a system based on modular devices that is economical and has a low computational cost. These devices use ultrasonic sensors to detect the speed and length of vehicles. The measurement accuracy is improved through the collaboration of the device modules. The experiments were performed using multiple modules oriented to different angles. This module is coupled with another specifically designed to detect distance using previous modules speed and length data. The collaboration between different modules reduces the speed relative error ranges from 1 to 5, depending on the angle configuration used in the modules.


A Micro Architectural Events Aware Real-Time Embedded System Fault Injector

arXiv.org Artificial Intelligence

In contemporary times, the increasing complexity of the system poses significant challenges to the reliability, trustworthiness, and security of the SACRES. Key issues include the susceptibility to phenomena such as instantaneous voltage spikes, electromagnetic interference, neutron strikes, and out-of-range temperatures. These factors can induce switch state changes in transistors, resulting in bit-flipping, soft errors, and transient corruption of stored data in memory. The occurrence of soft errors, in turn, may lead to system faults that can propel the system into a hazardous state. Particularly in critical sectors like automotive, avionics, or aerospace, such malfunctions can have real-world implications, potentially causing harm to individuals. This paper introduces a novel fault injector designed to facilitate the monitoring, aggregation, and examination of micro-architectural events. This is achieved by harnessing the microprocessor's PMU and the debugging interface, specifically focusing on ensuring the repeatability of fault injections. The fault injection methodology targets bit-flipping within the memory system, affecting CPU registers and RAM. The outcomes of these fault injections enable a thorough analysis of the impact of soft errors and establish a robust correlation between the identified faults and the essential timing predictability demanded by SACRES.


Difficulties in Dynamic Analysis of Drone Firmware and Its Solutions

arXiv.org Artificial Intelligence

With the advancement of Internet of Things (IoT) technology, its applications span various sectors such as public, industrial, private and military. In particular, the drone sector has gained significant attention for both commercial and military purposes. As a result, there has been a surge in research focused on vulnerability analysis of drones. However, most security research to mitigate threats to IoT devices has focused primarily on networks, firmware and mobile applications. Of these, the use of fuzzing to analyse the security of firmware requires emulation of the firmware. However, when it comes to drone firmware, the industry lacks emulation and automated fuzzing tools. This is largely due to challenges such as limited input interfaces, firmware encryption and signatures. While it may be tempting to assume that existing emulators and automated analysers for IoT devices can be applied to drones, practical applications have proven otherwise. In this paper, we discuss the challenges of dynamically analysing drone firmware and propose potential solutions. In addition, we demonstrate the effectiveness of our methodology by applying it to DJI drones, which have the largest market share.


Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies

arXiv.org Artificial Intelligence

GPU-embedded systems have gained popularity across various domains due to their efficient power consumption. However, in order to meet the demands of real-time or time-consuming applications running on these systems, it is crucial for them to be tuned to exhibit high performance. This paper addresses the issue by developing and comparing two tuning methodologies on GPU-embedded systems, and also provides performance insights for developers and researchers seeking to optimize applications running on these architectures. We focus on parallel prefix operations, such as FFT, scan primitives, and tridiagonal system solvers, which are performance-critical components in many applications. The study introduces an analytical model-driven tuning methodology and a Machine Learning (ML)-based tuning methodology. We evaluate the performance of the two tuning methodologies for different parallel prefix implementations of the BPLG library in an NVIDIA Jetson system, and compare their performance to the ones achieved through an exhaustive search. The findings shed light on the best strategies for handling the open challenge of performance portability for major computational patterns among server and embedded devices, providing practical guidance for offline and online tuning. We also address the existing gap in performance studies for parallel computational patterns in GPU-embedded systems by comparing the BPLG performance against other state-of-the-art libraries, including CUSPARSE, CUB, and CUFFT.


Samsung debuts its own 'AI-powered' smart recipe app

Engadget

As it promised last week, Samsung has launched Food, a "personalized, AI-powered food and recipe" app in eight languages and 104 countries around the world. It draws on the food database of Whisk, an app Samsung acquired a few years back -- and resembles a version of Whisk the company revealed last year. Given Samsung's large presence in kitchens with its smart fridges and other appliances, the release of a food and recipe app seems a logical step for the company. The app allows users to search for recipes around the world, save them and make weekly eating plans. The company prepared over 160,000 recipes for launch, with that number set to increase down the road.


OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload

arXiv.org Artificial Intelligence

Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise of multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded system present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller is ought to explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We leverage stochastic space exploration and we combine it with a highly accurate performance estimator to observe a x4.6 average throughput boost compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.


This retailer is using RFID tags to make in-person clothes shopping less frustrating

ZDNet

Buying clothes in person can be a frustrating experience. You go to the fitting room, try on the item, and find you've picked the wrong size. You then have to get dressed, go back onto the shop floor, get the right-sized item, and go through the whole process again in the fitting room. Finally, you find the right item in the right size -- but now you have to wait in a long line to make your purchase. What you thought was going to be a quick and easy procedure has turned into a bit of a slog.